A total of 19 metadata variables were imported from the sample sheet for this sample.
A total of 737,280 barcodes were submitted to the ambient RNA / empty droplet identification method EmptyDrops. This identified 6,582 cells and 730,698 empty droplets with an FDR cutoff of ≤ 0.10%.
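The cell-calling step described above amounts to thresholding a per-barcode FDR. The following is a minimal sketch in Python, not scFlow's own implementation: the function name `call_cells` and the convention of using `None` for barcodes that were never tested (those at or below the lower threshold) are assumptions made for illustration, and the input values are toy data rather than values from this report.

```python
from typing import Optional, Sequence, Tuple


def call_cells(fdr: Sequence[Optional[float]], cutoff: float = 0.001) -> Tuple[int, int]:
    """Return (n_cells, n_empty) for a hypothetical vector of per-barcode FDRs.

    Barcodes without an FDR (None) were not tested and are counted as empty.
    The cutoff of 0.001 corresponds to the 0.10% used in this report.
    """
    n_cells = sum(1 for q in fdr if q is not None and q <= cutoff)
    return n_cells, len(fdr) - n_cells


# Toy input only, not data from this sample
toy_fdr = [0.0, 0.0005, 0.001, 0.02, None, None]
result = call_cells(toy_fdr)
print(result)
```

As a consistency check on the reported numbers, the cells and empty droplets sum to the submitted barcodes: 6,582 + 730,698 = 737,280.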
Figure: Negative log-likelihood of barcodes in the multinomial model of EmptyDrops against total counts. Barcodes detected as putative cell-containing droplets at an FDR of 0.10% are marked in blue. Only barcodes with total_counts greater than the lower threshold (100 total counts) are shown.
The EmptyDrops algorithm was run with the specified 10,000 Monte Carlo iterations for the calculation of p-values. No barcodes were observed to have p-values limited by the number of iterations, suggesting that 10,000 iterations were sufficient for cell / empty droplet calling on these data.
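To illustrate what "limited by the number of iterations" means, the sketch below assumes the commonly used Monte Carlo estimator p = (b + 1) / (m + 1), where b is the number of simulations at least as extreme as the observation and m is the number of iterations. Whether EmptyDrops uses exactly this form is an assumption here, and the function names are hypothetical.

```python
def monte_carlo_pvalue(n_as_extreme: int, n_iters: int) -> float:
    """Monte Carlo p-value with the usual +1 correction: (b + 1) / (m + 1)."""
    if n_iters <= 0 or not 0 <= n_as_extreme <= n_iters:
        raise ValueError("require 0 <= n_as_extreme <= n_iters and n_iters > 0")
    return (n_as_extreme + 1) / (n_iters + 1)


def is_limited(n_as_extreme: int) -> bool:
    """True when no simulation was as extreme as the observation.

    The p-value then sits at its floor, 1 / (m + 1), and could only be
    lowered by running more iterations.
    """
    return n_as_extreme == 0


n_iters = 10_000
p_floor = monte_carlo_pvalue(0, n_iters)
print(p_floor)  # smallest attainable p-value with 10,000 iterations
```

A limited barcode is only a concern if its floor p-value is still too large to pass the FDR cutoff; in that situation, increasing the number of iterations would be the natural remedy.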
To evaluate whether the model was reliable, the p-values for all presumed ambient barcodes (i.e. total_counts <100) were extracted. Under the null hypothesis, the p-values for these barcodes should be uniformly distributed.
Figure: Distribution of p-values for low total-count barcodes. Ideally this distribution should be close to uniform. Large peaks near zero would suggest that barcodes with fewer total_counts than the lower parameter (lower = 100) are not all ambient. In that case, further decreasing the lower parameter is recommended to ensure that barcodes representing genuine cells with relatively low expression are not used to estimate the ambient profile.
In addition to this visualization, uniformity was tested using the Kolmogorov-Smirnov test, which yielded a p-value of 0.087 (a value ≤0.05 would suggest non-uniformity), so uniformity was not rejected.
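The uniformity check can be sketched as a one-sample Kolmogorov-Smirnov test against Uniform(0, 1). The version below is a standard-library illustration, not the routine scFlow calls: the function name `ks_uniform` is hypothetical, and the p-value uses the asymptotic Kolmogorov series, so it is only approximate for small samples and will not reproduce the 0.087 reported above.

```python
import math
from typing import Sequence, Tuple


def ks_uniform(pvalues: Sequence[float]) -> Tuple[float, float]:
    """One-sample KS test against Uniform(0, 1).

    Returns (D, approximate p-value), where D is the largest gap between
    the empirical CDF and the uniform CDF F(v) = v.
    """
    x = sorted(pvalues)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one p-value")
    d_plus = max((i + 1) / n - v for i, v in enumerate(x))
    d_minus = max(v - i / n for i, v in enumerate(x))
    d = max(d_plus, d_minus)

    t = math.sqrt(n) * d
    if t < 0.2:
        # The series converges slowly here and the tail probability is ~1.
        return d, 1.0
    # Asymptotic tail: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2)
    tail = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t) for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, tail))


# Toy inputs: an evenly spread sample and one skewed toward zero
even = [(i + 0.5) / 100 for i in range(100)]
skewed = [v * v for v in even]
d_even, p_even = ks_uniform(even)
d_skew, p_skew = ks_uniform(skewed)
print(p_even > 0.05, p_skew <= 0.05)
```

With the decision rule used in this report, the evenly spread toy sample is compatible with uniformity, while the sample skewed toward zero is flagged as non-uniform.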
Figure: Barcode count depth rank plot. The ‘elbow’ indicates where count depth decreases rapidly (relative increase in background counts), and can be used to inform the count depth threshold. The applied lower-limit counts threshold is indicated at 100 counts (red line).
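The data behind a barcode rank plot can be derived as follows. This is a minimal sketch with hypothetical function names and toy counts, assuming that zero-count barcodes are dropped because both axes are usually drawn on a log scale.

```python
from typing import List, Sequence, Tuple


def barcode_ranks(total_counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (rank, count) pairs, with rank 1 for the deepest barcode.

    Zero-count barcodes are dropped because they cannot be drawn on log axes.
    """
    ordered = sorted((c for c in total_counts if c > 0), reverse=True)
    return list(enumerate(ordered, start=1))


def n_above_lower(total_counts: Sequence[int], lower: int = 100) -> int:
    """Count barcodes whose total counts exceed the lower threshold."""
    return sum(1 for c in total_counts if c > lower)


# Toy counts only, not data from this sample
toy_counts = [5000, 4200, 3900, 150, 90, 40, 3, 1, 0]
ranks = barcode_ranks(toy_counts)
print(ranks[:4], n_above_lower(toy_counts))
```

In the toy data the sharp drop between the third and fourth barcodes plays the role of the elbow, and the threshold of 100 separates the four deepest barcodes from the rest.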
Figure: Histogram of count depth per cell. A lower-limit threshold of 100 was applied (red line).
Figure: Histogram of number of genes per cell. A lower-limit threshold of 100 was applied (red line).
Figure: Number of genes versus count depth coloured by relative mitochondrial counts. The count-depth threshold of 100 counts and the number of genes threshold of 100 genes are indicated with vertical and horizontal red lines, respectively. High mitochondrial fractions are typically found in cells with relatively lower count depth. Cells with fractional mitochondrial counts higher than 0.1 (i.e. 10.00%) were filtered.
Figure: Histogram of mitochondrial fraction per cell. An upper threshold of 0.1 (i.e. 10.00%) maximum mitochondrial fraction was applied (red line).
Figure: Histogram of ribosomal fraction per cell. An upper threshold of 1 (i.e. 100.00%) maximum ribosomal fraction was applied (red line).
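Taken together, the thresholds shown in the figures above define a per-cell filter. The sketch below is illustrative only: the `CellQC` fields and the `passes_qc` function are hypothetical names, the cells are toy data, and treating every threshold as inclusive is an assumption, since the report does not state how values exactly at a threshold are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellQC:
    barcode: str
    total_counts: int
    n_genes: int
    mito_fraction: float  # fraction of counts from mitochondrial genes, 0 to 1
    ribo_fraction: float  # fraction of counts from ribosomal genes, 0 to 1


def passes_qc(
    cell: CellQC,
    min_counts: int = 100,
    min_genes: int = 100,
    max_mito: float = 0.1,
    max_ribo: float = 1.0,
) -> bool:
    """Apply the four thresholds from this report (inclusive bounds assumed)."""
    return (
        cell.total_counts >= min_counts
        and cell.n_genes >= min_genes
        and cell.mito_fraction <= max_mito
        and cell.ribo_fraction <= max_ribo
    )


# Toy cells only, not data from this sample
toy_cells = [
    CellQC("AAAC", 5200, 2100, 0.03, 0.12),  # passes every threshold
    CellQC("AAAG", 80, 60, 0.02, 0.10),      # too few counts and genes
    CellQC("AAAT", 900, 400, 0.25, 0.08),    # mitochondrial fraction too high
]
kept = [c.barcode for c in toy_cells if passes_qc(c)]
print(kept)
```

Note that a ribosomal threshold of 1 can never exclude a cell, so with these settings only the counts, genes, and mitochondrial thresholds are effective.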
A total of 6,581 cells that passed QC were submitted to the multiplet identification algorithm “doubletfinder”. This identified 346 multiplets and 6,235 singlets. To identify these, the variable(s) "" were first regressed out of the data. The 2,000 most variable genes were then selected, and the first 10 principal components computed from them were used. An assumed doublet formation rate of 0.052648 (i.e. 5.26%) was applied.
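The reported multiplet count is consistent with the assumed doublet formation rate. The short calculation below assumes that the expected number of multiplets is obtained by rounding the product of the rate and the number of cells, which is a common convention in DoubletFinder workflows but is not stated in this report; the function name is hypothetical.

```python
from typing import Tuple


def expected_multiplets(n_cells: int, doublet_rate: float) -> Tuple[int, int]:
    """Return (expected multiplets, remaining singlets) for a given rate."""
    n_multiplets = round(n_cells * doublet_rate)
    return n_multiplets, n_cells - n_multiplets


# Values taken from this report
n_multiplets, n_singlets = expected_multiplets(6581, 0.052648)
print(n_multiplets, n_singlets)  # 346 6235
```

Here 6581 × 0.052648 ≈ 346.48, which rounds to the 346 multiplets reported, leaving 6235 singlets.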
The 346 multiplets identified by “doubletfinder” are visualized below in red in PCA space, tSNE space, and UMAP space.
PCA
tSNE
UMAP
Lun, Aaron T. L., Riesenfeld, Samantha, Andrews, Tallulah, Dao, The Phuong, Gomes, Tomas, Marioni, John C. (2019). EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biology. 20(1), 63.
McGinnis, Christopher S., Murrow, Lyndsay M., Gartner, Zev J. (2019). DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems. 8(4), 329–337.e4.
Stuart, Tim, Butler, Andrew, Hoffman, Paul, Hafemeister, Christoph, Papalexi, Efthymia, Mauck, William M., Hao, Yuhan, Stoeckius, Marlon, Smibert, Peter, Satija, Rahul (2019). Comprehensive Integration of Single-Cell Data. Cell. 177(7), 1888–1902.e21.
scFlow v0.5.0 – 2020-05-15 03:00:45
A report by scFlow